End-to-end speech synthesis based on WaveNet
QIU Zeyu, QU Dan, ZHANG Lianhai
Journal of Computer Applications 2019, 39(5): 1325-1329. DOI: 10.11772/j.issn.1001-9081.2018102131
Abstract (1089) | PDF (819KB) (576)
The Griffin-Lim algorithm is widely used for phase estimation in end-to-end speech synthesis, but it often produces noticeably artificial, low-fidelity speech. To address this problem, an end-to-end speech synthesis system based on the WaveNet network architecture was proposed. Built on a Sequence-to-Sequence (Seq2Seq) structure, the system first converted the input text into a one-hot vector, then applied an attention mechanism to obtain a Mel spectrogram, and finally used a WaveNet network to reconstruct phase information and generate time-domain waveform samples from the Mel spectrogram features. Evaluated on both English and Chinese, the proposed method achieves a Mean Opinion Score (MOS) of 3.31 on the LJSpeech-1.0 corpus and 3.02 on the THCHS-30 corpus, outperforming both end-to-end systems based on the Griffin-Lim algorithm and parametric systems in naturalness.
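The vocoder stage described above rests on WaveNet's core building block, the dilated causal convolution: each output sample depends only on current and past samples, and doubling the dilation at every layer grows the receptive field exponentially. The following numpy sketch (hypothetical weights and dimensions, not the paper's model) illustrates the mechanism and its causality.

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """Kernel-size-2 dilated causal convolution:
    y[t] = w[0]*x[t-dilation] + w[1]*x[t],
    left-padded with zeros so outputs never see future samples."""
    n = len(x)
    xp = np.concatenate([np.zeros(dilation), x])
    return w[0] * xp[:n] + w[1] * xp[dilation:dilation + n]

def wavenet_stack(x, weights, dilations=(1, 2, 4, 8)):
    """Stack of dilated causal convolutions with tanh nonlinearities;
    the doubling dilations give an exponentially growing receptive field."""
    h = x
    for w, d in zip(weights, dilations):
        h = np.tanh(dilated_causal_conv(h, w, d))
    return h

rng = np.random.default_rng(0)
x = rng.standard_normal(32)                      # toy waveform, 32 samples
weights = [rng.standard_normal(2) for _ in range(4)]
y = wavenet_stack(x, weights)

# causality check: perturbing sample 20 leaves all outputs before t=20 unchanged
x2 = x.copy()
x2[20] += 1.0
y2 = wavenet_stack(x2, weights)
assert np.allclose(y[:20], y2[:20])
```

In the full system these convolutions would be conditioned on the Mel spectrogram features from the attention network; the sketch keeps only the autoregressive causal structure that lets WaveNet replace Griffin-Lim phase reconstruction.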
Acoustic modeling approach of multi-stream feature incorporated convolutional neural network for low-resource speech recognition
QIN Chuxiong, ZHANG Lianhai
Journal of Computer Applications 2016, 36(9): 2609-2615. DOI: 10.11772/j.issn.1001-9081.2016.09.2609
Abstract (636) | PDF (1145KB) (371)
To address the insufficient training of Convolutional Neural Network (CNN) acoustic model parameters under low-resource training data conditions in speech recognition tasks, a method utilizing multi-stream features was proposed to improve CNN acoustic modeling performance in low-resource speech recognition. Firstly, to exploit sufficient acoustic information from the limited data, multiple feature types were extracted from the training data. Secondly, a convolutional subnetwork was built for each feature type, forming a parallel structure that regularizes the distributions of the multiple features. Then, fully connected layers were added above the parallel convolutional subnetworks to fuse the multi-stream features, yielding a new CNN acoustic model. Finally, a low-resource speech recognition system was built on this acoustic model. Experimental results show that the parallel convolutional subnetworks map the different feature spaces into more similar distributions, and that the system gains recognition accuracy improvements of 3.27% and 2.08% over the traditional multi-feature splicing training approach and the baseline CNN system respectively. Moreover, the method remains applicable when multilingual training is introduced, improving recognition accuracy by 5.73% and 4.57% respectively.
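The architecture described above routes each feature stream through its own subnetwork before a shared fully connected layer fuses them. A minimal numpy sketch of that forward pass follows; the stream names, dimensions, and class count are illustrative assumptions, and each per-stream "subnetwork" is reduced to a single affine map plus ReLU for brevity rather than the paper's convolutional stack.

```python
import numpy as np

rng = np.random.default_rng(1)

def stream_subnet(x, w):
    """Stand-in for one per-stream convolutional subnetwork
    (a single affine map + ReLU, mapping into a common hidden space)."""
    return np.maximum(0.0, x @ w)

# two hypothetical feature streams over 5 frames:
# 13-dim MFCC-like features and 40-dim filterbank-like features
mfcc = rng.standard_normal((5, 13))
fbank = rng.standard_normal((5, 40))

# each stream gets its own subnetwork mapping into a shared 32-dim space,
# so differently distributed features are regularized toward a common form
w_mfcc = 0.1 * rng.standard_normal((13, 32))
w_fbank = 0.1 * rng.standard_normal((40, 32))
h = np.concatenate([stream_subnet(mfcc, w_mfcc),
                    stream_subnet(fbank, w_fbank)], axis=1)  # (5, 64)

# shared fully connected layer fuses the parallel streams into class scores
w_fc = 0.1 * rng.standard_normal((64, 10))  # 10 hypothetical output classes
logits = h @ w_fc
assert logits.shape == (5, 10)
```

The key design choice mirrored here is that fusion happens after the per-stream subnetworks rather than by splicing raw features at the input, which is the baseline the paper reports improvements over.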